When pg_rewind success, the database can't startup - Mailing list pgsql-bugs

From hemin
Subject When pg_rewind success, the database can't startup
Date
Msg-id D73AFF2B-7325-4047-A325-F4B70828023B@ww-it.cn>+554133D5E253EE07
List pgsql-bugs

Dear PGer:

       I used pg_rewind to resolve a WAL divergence and it reported success, but the database cannot start up and outputs the error “requested timeline 3 does not contain minimum recovery point 0/DB35BE80 on timeline 1”. The details follow.

       Thanks !

 

Problem Description:

       There is a primary/standby cluster with asynchronous replication. While a large amount of data was being loaded into the primary node, we stopped its database manually. We then promoted the standby node to be the new primary and inserted new data into it. Finally, we ran pg_rewind on the old primary to resolve the WAL divergence; it reported success, but the node cannot be started and fails with the following error:

       “2018-06-06 14:40:18.686 CST [2687] FATAL:  requested timeline 3 does not contain minimum recovery point 0/DB35BE80 on timeline 1

          2018-06-06 14:40:18.686 CST [2686] LOG:  startup process (PID 2687) exited with exit code 1”
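
The minimum recovery point named in this error is stored in the cluster's control file, so it may help to read it back with pg_controldata after the rewind. The following is only a sketch: the helper name controldata_field is hypothetical, and it assumes pg_controldata prints its labels in English (for example with LC_ALL=C).

```shell
# Hypothetical helper: print the value of one labelled field from
# pg_controldata output read on stdin. Assumes English labels.
controldata_field() {
    # Strip "<label>:" and the padding after it, print what remains.
    sed -n "s/^$1:[[:space:]]*//p"
}

# Intended usage on the rewound node (not run here):
#   LC_ALL=C pg_controldata "$PGDATA" | controldata_field "Minimum recovery ending location"
#   LC_ALL=C pg_controldata "$PGDATA" | controldata_field "Min recovery end loc's timeline"
#   LC_ALL=C pg_controldata "$PGDATA" | controldata_field "Latest checkpoint's TimeLineID"
```

Comparing these values with the location and timeline in the FATAL message shows whether the control file still carries a minimum recovery point from the old timeline.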

 

 

Environment:  primary/standby cluster with asynchronous replication; the database version is PostgreSQL 10

 

Primary Node Info:

       System: CentOS 6, IP: 10.9.5.21, port: 5410

Standby Node Info:

       System: CentOS 6, IP: 10.9.5.22, port: 5410

 

 

Steps to Reproduce:

 

(1) Init environment:    create a primary/standby cluster with asynchronous replication, and add to pg_hba.conf the access rule that pg_rewind will use;

(2) Primary Node:         insert 1,500,000 rows of data into the database using pgbench:

       pgbench -i -s 15 postgres

(3) Primary Node:         once pgbench has finished inserting and begins to vacuum the database, stop the database manually:

       pg_ctl -D $PGDATA stop

(4) Standby Node:        promote the standby node to primary:

       pg_ctl -D $PGDATA promote

(5) Standby Node:        insert 3,000,000 rows of data into the database using pgbench:

       pgbench -i -s 30 postgres

(6) Primary Node:         use pg_rewind to resolve the WAL divergence:

       pg_rewind --target-pgdata='/var/lib/pgsql/10/data' --source-server='host=10.9.5.22 port=5410 dbname=postgres user=postgres password=xxx'

      

       servers diverged at WAL location 0/AEEE94D0 on timeline 1

       rewinding from last common checkpoint at 0/AEEE9460 on timeline 1

       Done!

(7) Primary Node:         startup fails:

       pg_ctl -D $PGDATA start

 

       waiting for server to start....2018-06-06 14:40:18.194 CST [2686] LOG:  listening on IPv4 address "0.0.0.0", port 5410

       2018-06-06 14:40:18.194 CST [2686] LOG:  listening on IPv6 address "::", port 5410

       2018-06-06 14:40:18.256 CST [2686] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5410"

       2018-06-06 14:40:18.372 CST [2687] LOG:  database system was interrupted while in recovery at log time 2018-06-06 14:12:45 CST

       2018-06-06 14:40:18.372 CST [2687] HINT:  If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.

       2018-06-06 14:40:18.686 CST [2687] LOG:  entering standby mode

       2018-06-06 14:40:18.686 CST [2687] FATAL:  requested timeline 3 does not contain minimum recovery point 0/DB35BE80 on timeline 1

       2018-06-06 14:40:18.686 CST [2686] LOG:  startup process (PID 2687) exited with exit code 1

       2018-06-06 14:40:18.686 CST [2686] LOG:  aborting startup due to startup process failure

       2018-06-06 14:40:18.690 CST [2686] LOG:  database system is shut down

       stopped waiting

       pg_ctl: could not start server

       Examine the log output.
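
To relate the two WAL locations in the output above, they can be compared numerically. An LSN written as X/Y is a 64-bit position whose high 32 bits are X and low 32 bits are Y, both in hexadecimal. The sketch below uses a hypothetical helper, lsn_to_int, and the values reported by pg_rewind and by the FATAL message; it only illustrates the comparison and does not diagnose the cause.

```shell
# Hypothetical helper: convert an LSN of the form "X/Y" (hex) into a
# single integer, (X << 32) + Y, so that two LSNs can be compared.
lsn_to_int() {
    lsn_hi=${1%%/*}    # part before the slash: high 32 bits
    lsn_lo=${1##*/}    # part after the slash: low 32 bits
    echo $(( (0x$lsn_hi << 32) + 0x$lsn_lo ))
}

diverged=0/AEEE94D0       # "servers diverged at WAL location" from pg_rewind
min_recovery=0/DB35BE80   # minimum recovery point from the FATAL message

if [ "$(lsn_to_int "$min_recovery")" -gt "$(lsn_to_int "$diverged")" ]; then
    echo "minimum recovery point $min_recovery is past the divergence point $diverged"
else
    echo "minimum recovery point $min_recovery is not past the divergence point $diverged"
fi
```

With these values the minimum recovery point on timeline 1 lies beyond the location where the servers diverged, which appears consistent with the complaint that timeline 3 does not contain it.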

 

 

何敏

 

Call: 185.0821.2027 | Fax: 028.6143.1877 | Web: w3.ww-it.cn

成都文武信息技有限公司|ChengDu WenWu Information Technology Inc.|WwIT

Address: 成都高新区天府件园B7611 | Postal code: 610041
